Selective Classification

MaskTune: Mitigating Spurious Correlations by Forcing to Explore

Neural Information Processing Systems

This work proposes MaskTune, a masking strategy that prevents over-reliance on spurious (or a limited number of) features. MaskTune forces the trained model to explore new features during a single epoch of fine-tuning by masking previously discovered features. Unlike earlier approaches for mitigating shortcut learning, MaskTune does not require any supervision, such as annotating spurious features or labels for subgroup samples in a dataset.
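The core masking step can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes per-feature saliency scores are already available (e.g. from a gradient-based attribution method) and simply zeroes out the features the trained model relies on most, producing the masked inputs for the single fine-tuning epoch. The function name and the `frac` parameter are our own.

```python
def mask_top_features(x, saliency, frac=0.25):
    """Zero out the fraction of input features the current model relies
    on most, so that fine-tuning on the result forces the model to use
    previously unexplored features.

    x, saliency: parallel lists; saliency[i] approximates how strongly
    the trained model depends on feature i (attribution method assumed).
    """
    k = max(1, int(len(x) * frac))
    # Indices of the k most salient (most relied-upon) features.
    top = sorted(range(len(x)), key=lambda i: saliency[i], reverse=True)[:k]
    return [0.0 if i in top else v for i, v in enumerate(x)]

# Features 0 and 3 carry most of the saliency, so they get masked out.
masked = mask_top_features([0.9, 0.1, 0.4, 0.7], [5.0, 0.2, 0.1, 3.0], frac=0.5)
print(masked)  # [0.0, 0.1, 0.4, 0.0]
```

In the actual method the mask is applied to input regions (e.g. image patches) rather than abstract feature vectors, but the principle is the same: hide what the model already uses, then fine-tune briefly.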


Hierarchical Selective Classification

Neural Information Processing Systems

This paper introduces hierarchical selective classification, extending selective classification to a hierarchical setting. Our approach leverages the inherent structure of class relationships, enabling models to reduce the specificity of their predictions when faced with uncertainty. In this paper, we first formalize hierarchical risk and coverage, and introduce hierarchical risk-coverage curves. Next, we develop algorithms for hierarchical selective classification (which we refer to as inference rules), and propose an efficient algorithm that guarantees a target accuracy constraint with high probability. Lastly, we conduct extensive empirical studies on over a thousand ImageNet classifiers, revealing that training regimes such as CLIP, pretraining on ImageNet-21k, and knowledge distillation boost hierarchical selective performance.
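A simple inference rule in this spirit can be sketched as below. This is an assumption-level illustration rather than the paper's exact algorithm: starting from the most likely leaf class, it climbs the class hierarchy until the probability mass under the current node reaches a confidence threshold, trading specificity for confidence. All names (`hierarchical_predict`, `theta`) are our own.

```python
def hierarchical_predict(probs, parent, leaves_under, theta=0.9):
    """Return the most specific node whose subtree probability >= theta.

    probs: dict leaf class -> predicted probability (sums to 1).
    parent: dict node -> parent node (the root is its own parent).
    leaves_under: dict node -> set of leaf classes beneath it.
    """
    node = max(probs, key=probs.get)  # start at the most likely leaf
    # Climb toward the root until the prediction is confident enough;
    # the root covers all leaves, so the loop always terminates.
    while sum(probs[l] for l in leaves_under[node]) < theta:
        node = parent[node]
    return node

# Toy hierarchy: animal -> {dog, cat}. "dog" alone is below theta,
# so the rule falls back to the coarser but confident "animal".
parent = {"dog": "animal", "cat": "animal", "animal": "animal"}
leaves_under = {"dog": {"dog"}, "cat": {"cat"}, "animal": {"dog", "cat"}}
print(hierarchical_predict({"dog": 0.6, "cat": 0.4}, parent, leaves_under))  # animal
```

With a sharper posterior such as `{"dog": 0.95, "cat": 0.05}` the same rule keeps the specific leaf prediction `dog`.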


Top-Ambiguity Samples Matter: Understanding Why Deep Ensemble Works in Selective Classification

Neural Information Processing Systems

Selective classification allows a machine learning model to reject some hard inputs and thus improve the reliability of its predictions. In this area, the ensemble method is powerful in practice, but there has been no solid analysis of why it works. Inspired by an interesting empirical result that the improvement of the ensemble largely comes from top-ambiguity samples where its member models diverge, we prove that, under some assumptions, the ensemble has a lower selective risk than the member model for any coverage within a range. The proof is nontrivial since the selective risk is a non-convex function of the model prediction. The assumptions and the theoretical results are supported by systematic experiments on both computer vision and natural language processing tasks.
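The central quantity here, selective risk at a given coverage, can be computed as in the sketch below: keep only the most confident fraction of predictions and measure error on those. This is a minimal illustration of the quantity the analysis concerns (function and variable names are our own), not code from the paper.

```python
def selective_risk(confidences, correct, coverage):
    """Empirical selective risk: error rate on the most confident
    `coverage` fraction of samples.

    confidences: per-sample confidence scores.
    correct: parallel list of 1 (correct) / 0 (wrong).
    coverage: fraction of samples to accept, in (0, 1].
    """
    n_keep = max(1, int(len(confidences) * coverage))
    order = sorted(range(len(confidences)),
                   key=lambda i: confidences[i], reverse=True)
    kept = order[:n_keep]
    return sum(1 - correct[i] for i in kept) / n_keep

conf = [0.99, 0.95, 0.60, 0.55]
corr = [1, 1, 1, 0]
print(selective_risk(conf, corr, 1.0))  # 0.25: full coverage includes the error
print(selective_risk(conf, corr, 0.5)) # 0.0: the low-confidence error is rejected
```

For an ensemble, `confidences` would come from the averaged member predictions; the paper's result is that on a range of coverages this curve sits below the member models' curves, with the gap driven by the high-ambiguity samples on which members disagree.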


Selective Classification for Deep Neural Networks

Neural Information Processing Systems

Selective classification techniques (also known as the reject option) have not yet been considered in the context of deep neural networks (DNNs). These techniques can potentially improve DNN prediction performance significantly by trading off coverage. In this paper we propose a method to construct a selective classifier given a trained neural network. Our method allows a user to set a desired risk level; at test time, the classifier rejects instances as needed to grant the desired risk (with high probability). Empirical results on CIFAR and ImageNet convincingly demonstrate the viability of our method, which opens up possibilities for operating DNNs in mission-critical applications. For example, using our method, an unprecedented 2% error in top-5 ImageNet classification can be guaranteed with probability 99.9%, at almost 60% test coverage.
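The thresholding idea behind such a construction can be sketched as follows. This is a simplified version using only empirical risk on a held-out set; the paper's actual algorithm additionally uses a binomial-tail bound so the risk guarantee holds with high probability. The function name and return convention are our own.

```python
def pick_threshold(confidences, correct, target_risk):
    """Find the lowest confidence threshold whose accepted set has
    empirical risk <= target_risk; return (threshold, coverage).

    At test time, the selective classifier accepts an input only if
    its confidence score is at least the returned threshold.
    """
    pairs = sorted(zip(confidences, correct), reverse=True)
    best = (float("inf"), 0.0)  # default: reject everything
    errors = 0
    for k, (c, ok) in enumerate(pairs, start=1):
        errors += 1 - ok
        # Accepting the top-k samples yields this empirical risk.
        if errors / k <= target_risk:
            best = (c, k / len(pairs))
    return best

conf = [0.9, 0.8, 0.7, 0.6]
corr = [1, 1, 0, 1]
print(pick_threshold(conf, corr, 0.3))  # (0.6, 1.0): full coverage meets the target
```

Tightening `target_risk` to, say, 0.1 would force the threshold up to 0.8 and the coverage down to 0.5, which is exactly the risk-coverage trade-off the abstract describes.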